Abstract

The time evolution of a simple model for crossover is discussed. A variant of
this model with an improved exploration behavior in phase space is derived as a
subset of standard one- and multi-point crossover operations. This model is solved
analytically in the flat fitness case. Numerical simulations compare how different
genetic operators explore phase space. In the case of a non-flat fitness landscape,
numerical solutions of the evolution equations point out ways to estimate
premature convergence.



During the last decade, genetic algorithms [1, 2] have advanced to very powerful
optimization tools with real world applications in many different fields [3]. The
theoretical understanding of these algorithms, however, has not kept pace with this
development. After a brief overview of the different approaches to an understanding
of genetic algorithms, we will present an alternative view from the perspective of
statistical mechanics.

The basic mechanism of genetic algorithms works as follows. The set of parameters
of a given problem is coded as an N-dimensional binary vector $\{x_i\}$ with
$x_i \in \{0,1\}$ for all components i. The task is then to find an optimal solution
by looking for a maximum of a suitably defined fitness function in this N-dimensional
parameter space. The basic mechanisms to scan this space are mutation and crossover,
followed by a selection step where the fittest vectors are selected. Let us consider
the simple case of a flat fitness landscape with fitness function f = 1. Mutation
moves the vectors in this space by stochastic flips of single components of the
parameter vectors. It covers the space by a process similar to diffusion, spreading
out via nearest neighbors into the search space. Crossover exhibits a completely
different propagation behavior. In its simplest form, it takes two random vectors and
swaps a certain fraction of components between them. The simplest version is the
one-point crossover, where all bits beyond a certain crossover point are exchanged.
The "offspring" produced typically do not belong to the close neighborhood of the
"parents" in phase space, so that crossover is able to cover a large search space
quite fast. Unlike mutation, it does not suffer from the strongly inhomogeneous nature
of the diffusion process. Finally, it is important to note that, starting from two
maximally distant vectors, crossover is able to reach any other point in phase space.
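
The two operators described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original study: bit vectors are represented as tuples, and the names `mutate`, `one_point_crossover` and `hamming` are chosen here for convenience.

```python
import random

def mutate(x, rng):
    """Flip one randomly chosen component of the bit vector x."""
    i = rng.randrange(len(x))
    return x[:i] + (1 - x[i],) + x[i + 1:]

def one_point_crossover(x, y, point):
    """Exchange all bits beyond `point` between the parents x and y."""
    return x[:point] + y[point:], y[:point] + x[point:]

def hamming(x, y):
    """Number of components in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

rng = random.Random(0)
zeros, ones = (0,) * 8, (1,) * 8

mutant = mutate(zeros, rng)                             # one flip away from its parent
child_a, child_b = one_point_crossover(zeros, ones, 3)  # crossover point after bit 3
```

A single mutation always stays at Hamming distance 1 from its parent, whereas one crossover of the two maximally distant vectors already produces a child that differs from `zeros` in five of the eight components.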

The dynamics of mutation can be understood in terms of non-equilibrium statistical
mechanics [4]. However, very little is known about the convergence and phase space
dynamics of crossover. Due to the highly nonlinear nature of the crossover operator,
a full calculation of the dynamics quickly becomes complex even without fitness, not
to mention the evolution in arbitrary fitness landscapes, where hardly anything
quantitative can be said about the full time evolution. Another problem arises from
the complicated dynamics of finite size populations. Different approaches have been
taken to get a good understanding of how genetic algorithms work, and several limiting
cases have proved useful. The most general statements about the convergence of a
genetic algorithm can be made in the limit of just one time step of the evolution. One
finds inequalities about the change in frequency of the members in a population,
proving the convergence properties of genetic algorithms (Schema Theorem [1]), or
estimates for the evolution of the mean fitness of a population (Price's Theorem [5]).
A second approach is to explore the dynamics of a genetic algorithm for a specific
fitness function. Functions have been studied that are considered to be particularly
easy (royal road functions [6]) or hard (deceptive problems [7]) for a genetic
algorithm. The time evolution of a genetic algorithm quickly becomes complicated, not
least due to the finite size of the populations. Approaches have been taken that treat
small populations as Markov chains [8, 9] on the one hand, and the infinite population
limit of statistical mechanics [10, 11] on the other. Results of the latter approach
have been shown to be of importance also for the dynamics of finite populations [9].
In the following we proceed along the lines of this limit of statistical mechanics.






In this article we study the time evolution of infinite population models under
crossover. After deriving a one-point crossover model for a flat fitness landscape, we
present a hierarchical model of crossover that is optimized for a fast and homogeneous
exploration of the phase space. We solve the model analytically for a flat fitness
surface and give results of numerical studies with flat fitness as well as with a
rugged fitness function of the travelling salesman problem. The "hierarchical
crossover" operator proves to cover the phase space fast and, in the case of a flat
fitness, homogeneously. Furthermore, the coverage occurs in a truly hierarchical
fashion.

First let us consider how mutation moves into phase space in the example of a simple
spin chain. We start from a given vector $\{x_i\}$ with $x_i = 1$ for all i. In each
time step let us flip one of the components into the opposite state. How long does it
take to reach any vector in the phase space from one given starting vector? The
minimum time is simply given by the number of required spin flips, N in this case.
However, the probability to reach a distant state scales badly with the dimension N
due to the diffusion-type dynamics of mutation [4].

One-point crossover exhibits a better scaling with dimension. Let us start with two
maximally distant parent vectors $\{x_i\}$ with $x_i = 1$ and $\{y_i\}$ with $y_i = 0$
for all i. In every step, the operator generates a new "twist" location in the spin
chain pair. In addition, it is able to put together two parts containing possible
earlier twists. Therefore, the minimum time t to reach any vector in the N-dimensional
phase space is given by the condition $t \geq \log_2 N$. However, this optimum is not
efficiently implemented in the standard one-point crossover. The reason is that the two
vector parts put together by the crossover operator usually did not experience the
maximum number of earlier crossover flips. Nevertheless, they remain in the gene pool
and reach the target vector at a later time than the optimal combination of
recombination steps. The idea of "hierarchical crossover" is to eliminate these
sub-optimal paths of crossover evolution and retain just the shortest paths that lead
to any target vector in the phase space.
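
The bound on the minimum time can be checked by brute force for small N. The sketch below is an illustration under the assumption that every pair in the pool may recombine at every crossover point in each step (the idealized, infinitely large gene pool); the function names are hypothetical. It counts the steps needed until all $2^N$ strings have appeared when starting from the two maximally distant vectors.

```python
def crossover_closure_step(pool, n):
    """All strings obtainable from `pool` by at most one one-point crossover."""
    new_pool = set(pool)
    for a in pool:
        for b in pool:
            for cut in range(1, n):
                # head of one parent joined to the tail of the other
                new_pool.add(a[:cut] + b[cut:])
    return new_pool

def steps_to_cover(n):
    """Number of crossover steps until every n-bit string is in the pool."""
    pool = {(0,) * n, (1,) * n}
    steps = 0
    while len(pool) < 2 ** n:
        pool = crossover_closure_step(pool, n)
        steps += 1
    return steps

steps_n4 = steps_to_cover(4)
steps_n8 = steps_to_cover(8)
```

Each step at most doubles the number of twists plus one, which is why the count grows only logarithmically with N. The sketch says nothing about how probable each state is, only about when it first becomes reachable.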

To be more specific we first consider the dynamics of a two-dimensional string under
crossover in the limit of a large gene pool. The genes are strings of two binary
variables $x_1$ and $x_2$ with values $x_{1,2} \in \{0,1\}$. The probability to draw a
specific string from the pool is given by $P^0(x_1, x_2)$. We are interested in the
probability $P^t(x_1, x_2)$ to find the string at later times t. For a crossover
probability p, we obtain after the first time step:

$$P^1(x_1, x_2) = (1 - p)\,P^0(x_1, x_2) + p\,P^0(x_1)\,P^0(x_2) \tag{1}$$

where

$$P^0(x_1) = \sum_{x_2} P^0(x_1, x_2) \tag{2}$$

is the probability to find a string with a specified value of $x_1$. The partial
probability $P^0(x_1)$ corresponds to the probability of a schema $(x_1, *)$ in the
traditional formalism of genetic algorithms [1]. We can now define an operator $C^2$
which describes this decomposition of the probability of a state into partial
probabilities through crossover within a population. Define $C^2$ by

$$\sum_{y_1, y_2} C^2\, P^t(x_1, x_2)\, P^t(y_1, y_2) = \sum_{y_1, y_2} P^t(y_1, x_2)\, P^t(x_1, y_2) \tag{3}$$

which gives the probability for a state $(x_1, x_2)$ to be produced by crossover in the
time step $t \to t + 1$. (The superscript 2 refers to breaking up the probability into
two partial probabilities, the only way for N = 2.) This accounts for the fact that the
crossover operation within a population takes all possible pairs of strings and, as in
(1), with probability p exchanges the first components between them. One can now derive
the second time step by writing

$$P^2(x_1, x_2) = \sum_{y_1, y_2} \bigl[(1 - p)\,\mathbb{1} + p\,C^2\bigr]\,\bigl[(1 - p)P^0(x_1, x_2) + pP^0(x_1)P^0(x_2)\bigr] \times \bigl[(1 - p)P^0(y_1, y_2) + pP^0(y_1)P^0(y_2)\bigr]. \tag{4}$$

In general one obtains for $t \geq 2$

$$P^t(x_1, x_2) = (1 - p)^t\,P^0(x_1, x_2) + \bigl[1 - (1 - p)^t\bigr]\,P^0(x_1)\,P^0(x_2). \tag{5}$$

The distribution of strings in the gene pool at any time follows directly from the
initial distribution. In general, for large N, crossover operates at different points
of the strings and contributes N - 1 different terms at each time step. The expressions
for the evolution with time become large quickly and a solution for general N is not
readily obtained.
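
As a quick numerical check of the closed form (5), one can iterate the one-step map (1) and compare. The sketch below is only an illustration: the initial distribution, the value p = 0.3 and the function names are arbitrary choices made here, not taken from the original study.

```python
def marginals(P):
    """Partial probabilities P(x1) and P(x2) of a distribution over (x1, x2)."""
    m1 = {a: P[(a, 0)] + P[(a, 1)] for a in (0, 1)}
    m2 = {b: P[(0, b)] + P[(1, b)] for b in (0, 1)}
    return m1, m2

def crossover_step(P, p):
    """One time step for N = 2: P <- (1 - p) P + p P(x1) P(x2)."""
    m1, m2 = marginals(P)
    return {(a, b): (1 - p) * P[(a, b)] + p * m1[a] * m2[b]
            for a in (0, 1) for b in (0, 1)}

def closed_form(P0, p, t):
    """Closed form (1 - p)^t P0 + [1 - (1 - p)^t] P0(x1) P0(x2)."""
    m1, m2 = marginals(P0)
    decay = (1 - p) ** t
    return {(a, b): decay * P0[(a, b)] + (1 - decay) * m1[a] * m2[b]
            for a in (0, 1) for b in (0, 1)}

P0 = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
p, steps = 0.3, 5

P = dict(P0)
for _ in range(steps):
    P = crossover_step(P, p)

exact = closed_form(P0, p, steps)
max_error = max(abs(P[s] - exact[s]) for s in P)
```

The agreement rests on the fact that crossover leaves the partial probabilities $P(x_1)$ and $P(x_2)$ unchanged, so the product term is the same in every step.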

Let us consider the recombination paths in one-point crossover. Choosing the crossover
points at any position with equal probabilities we obtain for arbitrary N

$$P^{t+1}(x_1, \ldots, x_N) = (1 - p)\,P^t(x_1, \ldots, x_N) + \frac{p}{N - 1}\bigl[P^t(x_1)P^t(x_2, \ldots, x_N) + P^t(x_1, x_2)P^t(x_3, \ldots, x_N) + \ldots + P^t(x_1, \ldots, x_{N-1})P^t(x_N)\bigr]. \tag{6}$$

After several time steps, a given state may have many different possible origins via
the different possible combinations of crossover operations leading to the same state.
For the case N = 4 this is shown in figure 1. For p = 1 the different paths of
crossover are shown in terms of sub-string probabilities. Note that these paths in
general have different lengths. In this case, the shortest path reaches any state after
two time steps. In the following we will take a closer look at just this optimal path
in the evolution. It is the marked path in the middle, where the crossover point is
always chosen in the middle of any yet "untouched" (sub)string. The other paths to the
right and left take one step longer. In principle they are redundant, since the path in
the middle is not only sufficient but even more economical. Below we will find that
this branch leads to an appealing analytical form of the evolution equation.
Furthermore we will pursue the idea of constructing a crossover operator which omits
the redundant terms. In other words, in each step we choose only the most effective
crossover points from the repertoire of standard crossover.
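
The one-point update for arbitrary N can be iterated directly for small strings. The following sketch stores the distribution as a dictionary over bit tuples; `marginal` and `one_point_step` are names introduced here for illustration, and the start from the two maximally distant strings with p = 1 is just one convenient example.

```python
from itertools import product

def marginal(P, lo, hi):
    """Probability of each sub-string x[lo:hi], summed over all other bits."""
    out = {}
    for x, prob in P.items():
        out[x[lo:hi]] = out.get(x[lo:hi], 0.0) + prob
    return out

def one_point_step(P, n, p):
    """One time step of one-point crossover with uniformly chosen crossover point."""
    left = {k: marginal(P, 0, k) for k in range(1, n)}
    right = {k: marginal(P, k, n) for k in range(1, n)}
    new_P = {}
    for x in product((0, 1), repeat=n):
        # sum over the n - 1 crossover points of P(x_1..x_k) P(x_{k+1}..x_n)
        mix = sum(left[k].get(x[:k], 0.0) * right[k].get(x[k:], 0.0)
                  for k in range(1, n))
        new_P[x] = (1 - p) * P.get(x, 0.0) + p * mix / (n - 1)
    return new_P

n = 4
P = {(0,) * n: 0.5, (1,) * n: 0.5}
for _ in range(2):
    P = one_point_step(P, n, p=1.0)

spread = max(P.values()) / min(P.values())
```

In this example all 16 states carry nonzero probability after two steps, but `spread` is far from 1: states that can only be reached along the single shortest path are much rarer than those fed by several paths.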

The dynamics are hard to depict, especially in a high-dimensional phase space. For
N = 4, the neighborhood relations between different states are simple enough that the
basic idea can be seen in a two-dimensional picture. This is shown in figure 2. Here,
the phase space is shown in the second time step after starting with the initial states
(0000) and (1111) at t = 0. Mutation only proceeds to nearest neighbors in each time
step, here depicted as horizontal bars. States with many bits differing from the
initial state are reached only in later steps. Crossover is able to scan phase space
beyond nearest neighbors and reaches all states on the circle in the next picture. This
skipping nature is known as the special feature of crossover. The third frame shows how
hierarchical crossover scans phase space. It reaches the boxed states at t = 1, which
form the pair of states maximally distant from the initial pair. It retains this
feature of crossover while omitting the overhead of redundant states at early times.

Figure 1: Different crossover paths for N = 4, shown as decompositions of P(1,2,3,4)
into sub-string probabilities such as P(1)P(2,3,4) and P(1,2)P(3,4), ending in
P(1)P(2)P(3)P(4).

In the analytical formulation, hierarchical crossover operates on strings of length
$N = 2^n$, n being an integer. For N = 4 we obtain at t = 1



$$P^1(x_1, x_2, x_3, x_4) = \sum_{y_1, y_2, y_3, y_4} \bigl[(1 - p)\,\mathbb{1} + p\,C^2\bigr]\,P^0(x_1, x_2, x_3, x_4)\,P^0(y_1, y_2, y_3, y_4) = (1 - p)\,P^0(x_1, x_2, x_3, x_4) + p\,P^0(x_1, x_2)\,P^0(x_3, x_4), \tag{7}$$

where $C^2$ chooses the crossover point in the middle of the strings. All total
probabilities are normalized to 1. The next step is

$$P^2(x_1, x_2, x_3, x_4) = \sum_{y_1, y_2, y_3, y_4} \bigl[(1 - p)\,\mathbb{1} + p\,C^4\bigr]\,\bigl[(1 - p)P^0(x_1, x_2, x_3, x_4) + pP^0(x_1, x_2)P^0(x_3, x_4)\bigr] \times \bigl[(1 - p)P^0(y_1, y_2, y_3, y_4) + pP^0(y_1, y_2)P^0(y_3, y_4)\bigr]. \tag{8}$$

Figure 2: Phase space exploration at t = 1 for initial states (0000) and (1111) at t = 0

Here, the operator $C^4$ swaps the $x_1$ and $x_3$ components such that one obtains the
particularly simple form

$$P^2(x_1, x_2, x_3, x_4) = (1 - p)^2 P_4 + (1 - p)\,p\,P_2 + p\,P_1, \tag{9}$$

where we denote

$$P_4 = P^0(x_1, x_2, x_3, x_4) \tag{10}$$

$$P_2 = P^0(x_1, x_2)\,P^0(x_3, x_4) \tag{11}$$

$$P_1 = P^0(x_1)\,P^0(x_2)\,P^0(x_3)\,P^0(x_4). \tag{12}$$

Furthermore, we assume that the operator $C^4$ always returns an expression of type
$P_1$, no matter which combination of $P_2$ and $P_4$ type expressions it operates on.
Any further application of this crossover operator essentially increases the $P_1$ term
by decomposing more of the $P_2$ and $P_4$ terms, while acting as an identity on the
pure $P_1$ states such that, for $t \geq 2$, $P^t(x_1, x_2, x_3, x_4)$ is given by

$$P^t = (1 - p)^{t-1}\bigl[(1 - p)P_4 + pP_2\bigr] + \bigl[1 - (1 - p)^{t-1}\bigr]P_1. \tag{13}$$

One can generalize this formalism to arbitrary dimensions $N = 2^n$ and obtains for
$t \geq n$

$$P^t_{2^n} = (1 - p)^{t-n+1}\bigl[(1 - p)^{n-1}P_{2^n} + (1 - p)^{n-2}\,p\,P_{2^{n-1}} + \ldots + p\,P_2\bigr] + \bigl[1 - (1 - p)^{t-n+1}\bigr]P_1. \tag{14}$$

The main feature of this result is that after only n crossover steps the nth step
produces all possible states (with equal probabilities if p = 1). The phase space
exploration occurs in a hierarchical fashion using only the shortest possible path for
each state. Any further evolution increases the density of this distribution.
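
The hierarchical evolution can be sketched numerically for an infinite population. The code below is an illustration under one assumption that should be stated plainly: it uses the recursion $P^t = (1 - p)P^{t-1} + p\,F_t$, where $F_t$ is the product of the initial partial probabilities over blocks of length $N / 2^{\min(t, n)}$. For N = 4 this reproduces (9) and (13), but it is a reading of the equations, not the simulation code used for the results reported here. All function names are hypothetical.

```python
from itertools import product

def block_product(P0, n, size):
    """Product of the initial marginals over consecutive blocks of `size` bits."""
    starts = range(0, n, size)
    margs = []
    for lo in starts:
        m = {}
        for x, prob in P0.items():
            m[x[lo:lo + size]] = m.get(x[lo:lo + size], 0.0) + prob
        margs.append(m)
    out = {}
    for x in product((0, 1), repeat=n):
        value = 1.0
        for m, lo in zip(margs, starts):
            value *= m.get(x[lo:lo + size], 0.0)
        out[x] = value
    return out

def hierarchical_evolution(P0, n, p, steps):
    """Iterate P <- (1 - p) P + p F_t, assuming n is a power of two."""
    levels = n.bit_length() - 1                 # n = 2**levels
    P = {x: P0.get(x, 0.0) for x in product((0, 1), repeat=n)}
    for t in range(1, steps + 1):
        size = n >> min(t, levels)              # block length halves down to 1
        target = block_product(P0, n, size)
        P = {x: (1 - p) * P[x] + p * target[x] for x in P}
    return P

start = {(0,) * 8: 0.5, (1,) * 8: 0.5}
P3 = hierarchical_evolution(start, 8, p=1.0, steps=3)
```

Starting from the two maximally distant strings with p = 1, the distribution `P3` for N = 8 is uniform over all 256 states after three steps, in line with the statement that the nth step produces all states with equal probabilities.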

In order to obtain this behavior in practical simulations one chooses a chromosome of
length $2^n$. Furthermore, this formalism uses a modified crossover operator, whose
construction is described in the following. In (9), which describes the evolution of a
whole population, it is easy to guess the result of any further application of the
crossover operator $C^4$: the result is always proportional to $P_1$. In the case
p = 1 we simply have to apply the crossover operators $C^{2^{t+1}}$ one after the
other, starting with $C^2$ at t = 0, until one reaches $C^N$ for strings of length
$N = 2^n$. If $p \neq 1$, we have to be slightly more careful. Now it is not the
overall time t that determines which operator we have to take, but rather the number of
crossover operations that the individual strings have undergone so far. We attach an
individual "age tag" to every individual, denoting by which operator it has been
produced. A string produced by $C^2$ has age 1, one from $C^4$ age 2, etc. The
prescription for the crossover procedure for one time step within a population is then
the following:

• For each pair in the population determine the minimum age a.

• Apply the crossover operator $C^{2^{a+1}}$ to each pair. The children are assigned
the age a' = a + 1, where a is the smallest age tag of the two parents.

• The children with a' = t + 1 are transferred to the next generation's gene pool.

• The children with a' < t + 1 have to be processed further. Build all possible pairs
from all children and group the pairs in different sub-populations of the same age
tags of the pairs, e.g., pairs with ages (1,1), (1,2), (2,1), etc., except for pairs
of the type (t+1, t+1).

• Perform successive crossover within each sub-population until the children all have
age t + 1.

• Mix the members within each sub-population between successive crossover steps.

• Once the members of all sub-populations have reached age t + 1, add them to the
next generation's gene pool.

Some remarks are due concerning this modified crossover. First of all, when expressing
this prescription in the formalism described earlier, one can show that it indeed
corresponds to the desired behavior of the hierarchical crossover operator in (14).
Furthermore, in the limit of large populations (which is the limit of the analytical
equations), some of the operations within the sub-populations are just mimicking
earlier operations, so in this limit the procedure can be simplified further.
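
A simplified finite-population version of the age-tag prescription might look as follows. This is a sketch under explicit assumptions, not the procedure used for the results reported here: the operator applied to a pair of minimum age a is taken to swap every other block of length $N / 2^{a+1}$ between the two parents (for N = 4 this is the middle cut at a = 0 and the swap of the $x_1$ and $x_3$ components at a = 1), and the re-processing of children whose age is still below t + 1 is omitted. The names `hierarchical_pair` and `generation` are hypothetical.

```python
import random

def hierarchical_pair(a, b, level):
    """Swap every other block of length len(a) // 2**level between a and b."""
    n = len(a)
    size = n // 2 ** level
    child_a, child_b = list(a), list(b)
    for lo in range(0, n, 2 * size):
        child_a[lo:lo + size] = b[lo:lo + size]
        child_b[lo:lo + size] = a[lo:lo + size]
    return tuple(child_a), tuple(child_b)

def generation(population, p, rng):
    """One simplified time step; `population` is an even-sized list of (bits, age)."""
    levels = len(population[0][0]).bit_length() - 1   # string length N = 2**levels
    shuffled = population[:]
    rng.shuffle(shuffled)
    next_pool = []
    for (a, age_a), (b, age_b) in zip(shuffled[::2], shuffled[1::2]):
        age = min(age_a, age_b)                       # minimum age of the pair
        if age < levels and rng.random() < p:
            child_a, child_b = hierarchical_pair(a, b, age + 1)
            next_pool += [(child_a, age + 1), (child_b, age + 1)]
        else:
            next_pool += [(a, age_a), (b, age_b)]     # pair passes unchanged
    return next_pool

rng = random.Random(1)
population = [((0,) * 8, 0)] * 4 + [((1,) * 8, 0)] * 4
population = generation(population, p=1.0, rng=rng)
```

For p = 1 all individuals share the same age, so the omitted sub-population steps never come into play. For p < 1 the sketch lets pairs of mixed age lag behind, which is exactly what the full prescription corrects.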

In the following we present this algorithm in numerical simulations and compare it to
mutation and one-point crossover, first for a flat fitness surface. The simulations
follow the complete evolution of the phase space, starting from the two maximally
distant vectors (0, 0, ..., 0) and (1, 1, ..., 1). This simulation corresponds to the
limit of very large populations in a regular genetic algorithm. This limit has proven
useful earlier to describe the average behavior of genetic algorithms with large
populations [9].

In figure 3 the filling of the phase space for a small N = 8 model is shown as the
probabilities of all states, forming a "probability landscape" over the whole space.
The leftmost square shows the initial condition with the two maximally distant vectors,
which is the same for all simulations. The 256 states of the phase space are depicted
in a 16 by 16 field where the coordinates follow the decimal values of the leading 4
and least 4 bits of the string, with the states coded as 0 or 1. That is, the lower
left corner of each square corresponds to (00000000), the upper right one to
(11111111), the upper left to (11110000) and the lower right to (00001111). The maximum
value on the z axis corresponds to a probability of 0.5 for a state to occur (red);
smaller values are shown on a logarithmic scale (down to blue), with 0 being black.
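
The layout of the 16 by 16 field can be made explicit with a small helper. The assignment of the leading four bits to the vertical axis and the last four bits to the horizontal axis is inferred here from the four corner states named above; the function name `grid_coords` is hypothetical.

```python
def grid_coords(state):
    """Map an 8-bit state to (column, row) in the 16 x 16 field.

    The column is the decimal value of the last 4 bits, the row that of the
    leading 4 bits, with (0, 0) being the lower left corner.
    """
    bits = "".join(str(b) for b in state)
    return int(bits[4:], 2), int(bits[:4], 2)

lower_left = grid_coords((0, 0, 0, 0, 0, 0, 0, 0))
upper_left = grid_coords((1, 1, 1, 1, 0, 0, 0, 0))
lower_right = grid_coords((0, 0, 0, 0, 1, 1, 1, 1))
upper_right = grid_coords((1, 1, 1, 1, 1, 1, 1, 1))
```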

The upper row of figure 3 shows the evolution under mutation with mutation probability
0.5. This diffusion-like process fills the space slowly, beginning near the starting
points. The exact diffusion-type nature would be visible in an 8-dimensional picture,
similar to mutation in figure 2. The probability density is very inhomogeneous. The
one-point crossover (with p = 1) in the middle row of figure 3 performs better. Due to
the recombination, distant points like the ones at the orthogonal corners are reached
already at t = 2. The probability distribution is, however, inhomogeneous by several
orders of magnitude. In the lower part of figure 3 the evolution for hierarchical
crossover is shown (for p = 1). The phase space is covered after 3 time steps and the
probability distribution of the states produced in step 3 is homogeneous (for all
states if p = 1). This homogeneous covering of phase space was obtained by just
omitting irrelevant operations from standard crossover procedures!

Figure 3: Evolution of N = 8 states for times t = 0, ..., 3

What can we derive from the structure of these "probability landscapes" of genetic
operators for practical problems with a non-flat fitness? In that case the "probability
landscape" of the genetic operator interacts in a non-trivial way with the fitness
landscape of the given problem. In order to investigate this, we will in the last part
of this article apply this formalism to a problem with a more rugged fitness. For this
purpose, we implement a small travelling salesman problem on an N = 8 string and follow
the complete evolution of the phase space. We choose 5 cities with one kept fixed and
code the position on the tour of the remaining 4 cities in an 8 bit string. The tour
length L is taken to be

$$L = \frac{1}{2} \sum_{i,j,k} D_{ij}\, l_{i,k}\,\bigl(l_{j,k+1} + l_{j,k-1}\bigr) \tag{15}$$

with $l_{i,k} = 1$ if city i is kth on the tour, else 0, and $D_{ij}$ being the distance
between cities i and j. The energy function H contains the length plus penalty terms for
multiple occurrence of cities and for preferring one direction

$$H = L + \lambda \sum_i \Bigl(1 - \sum_k l_{i,k}\Bigr)^2 + \lambda\,\vartheta \tag{16}$$

with $\vartheta = 1$ if the number of the first city is larger than the number of the
last city, or else $\vartheta = 0$. The fitness function f is chosen as

$$f = e^{-\beta H} \tag{17}$$

with a free parameter $\beta$ to adjust its "ruggedness".

Figure 4: Travelling salesman with 5 cities under standard crossover, $\beta = 0.1$

The evolution equations have to be modified to contain fitness. In particular, this
requires a normalization to preserve the probabilities. The above evolution equations
are to be multiplied by the fitness f and then normalized. For the simplest case N = 2
in the first time step this is

$$P^1(x_1, x_2) = \frac{f(x_1, x_2)\,\bigl[(1 - p)P^0(x_1, x_2) + pP^0(x_1)P^0(x_2)\bigr]}{\sum_{y_1, y_2} f(y_1, y_2)\,\bigl[(1 - p)P^0(y_1, y_2) + pP^0(y_1)P^0(y_2)\bigr]} \tag{18}$$

and accordingly for the higher terms. For standard one-point crossover we used for the
simulation the evolution equation

$$P^{t+1}(x) = \frac{f(x)\,\bigl[(1 - p)P^t(x) + \frac{p}{N-1}\sum_{k=1}^{N-1} P^t(x)_k\bigr]}{\sum_y f(y)\,\bigl[(1 - p)P^t(y) + \frac{p}{N-1}\sum_{k=1}^{N-1} P^t(y)_k\bigr]} \tag{19}$$

where $P^t(x)_k$ denotes $P^t(x_1, \ldots, x_k)\,P^t(x_{k+1}, \ldots, x_N)$.
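
The fitness-weighted one-point update can be sketched in the same dictionary representation. In the example below the fitness is a toy function $e^{-\beta H}$ with a made-up energy H that simply counts the zeros in the string; it stands in for the travelling salesman energy, whose city distances are not reproduced here. The function names and the values of `beta` and `p` are illustrative choices.

```python
import math
from itertools import product

def marginal(P, lo, hi):
    """Probability of each sub-string x[lo:hi], summed over all other bits."""
    out = {}
    for x, prob in P.items():
        out[x[lo:hi]] = out.get(x[lo:hi], 0.0) + prob
    return out

def selection_crossover_step(P, fitness, n, p):
    """One-point crossover step weighted by fitness and then normalized."""
    left = {k: marginal(P, 0, k) for k in range(1, n)}
    right = {k: marginal(P, k, n) for k in range(1, n)}
    weighted = {}
    for x in product((0, 1), repeat=n):
        mix = sum(left[k].get(x[:k], 0.0) * right[k].get(x[k:], 0.0)
                  for k in range(1, n))
        weighted[x] = fitness(x) * ((1 - p) * P.get(x, 0.0) + p * mix / (n - 1))
    norm = sum(weighted.values())               # restores total probability 1
    return {x: w / norm for x, w in weighted.items()}

beta = 1.0

def toy_fitness(x):
    """Toy stand-in for exp(-beta * H): here H is the number of zeros in x."""
    return math.exp(-beta * x.count(0))

n = 4
P = {(0,) * n: 0.5, (1,) * n: 0.5}
for _ in range(3):
    P = selection_crossover_step(P, toy_fitness, n, p=0.5)
```

With a flat fitness the normalization is trivial and the step reduces to the plain one-point update. With a steep fitness the product of fitness and operator probability decides which states survive, which is the interplay of the two landscapes discussed in the text.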

A simulation of the full evolution under these equations is more than the run of a
genetic algorithm: it determines the expected average evolution of a genetic algorithm
with a large population. In figures 4 and 5 the results of the simulations are shown.
Figure 4 shows how standard one-point crossover solves the problem. The fitness
function of the travelling salesman problem has been chosen such that the fitness
optimum lies in one of the less dense regions of the probability landscape of one-point
crossover. The simulations show that in this case the coincidence of a rugged fitness
landscape of the problem with a rugged probability landscape of the genetic operator
sharply increases the probability of premature convergence towards a false minimum of
the energy function. Only for a very smooth parametrization of the fitness landscape
($\beta < 0.1$) did the evolution converge towards the right solution. In that case, it
takes about 20 generations for the right solution to appear. Larger values of $\beta$
like 1 or 10 force premature convergence into false maxima. This results from the very
inhomogeneous probabilities of the states produced by crossover. In figure 5 one can see
that hierarchical crossover solves the problem even for a very inhomogeneous fitness
landscape ($\beta = 10$), already in the third time step. It turns out to be very
robust against steep fitness functions.

Figure 5: Travelling salesman, 5 cities, under hierarchical crossover, $\beta = 10$

In this article, we derived a simple model for crossover. The discussion of the time
evolution of this model (over more than one generation) led us to a simple variant with
an improved exploration behavior in phase space. This model is analytically solvable in
the flat fitness case. We developed the concept of the "probability landscape" of a
genetic operator as opposed to the "fitness landscape" of the underlying problem.
Numerical simulations suggest that genetic operators with a homogeneous probability
landscape are more robust against premature convergence. Inhomogeneities in both the
probability and the fitness landscape appear to favor premature convergence. The next
steps in this investigation include the generalization of the formalism to the case of
nontrivial fitness functions. Although a hopeless task for general models of crossover,
this might be more feasible for the model of hierarchical crossover, at least in some
special situations such as the statistical case of random fitnesses. Furthermore, it
has to be explored how the advantages of the hierarchical crossover operator translate
to finite size populations and large search spaces. The goal of this approach is a
better understanding of the dynamics of genetic algorithms.